-
- BIGSORT(tm)
- Version 1.02
-
- Written By: David Sheppard Poor
-
- Copyright (C) 1993: MeadowBrook Industries, Ltd
- ALL RIGHTS RESERVED
-
-
- DESCRIPTION: A utility to sort files:
- * Sorts small to extremely large files
- * Customized and complex key structures
- * FAST!
- * Fixed length and variable length records
- * Supports comma and other delimited file formats
- * Supports dBASE and compatible file formats
- * Up to eight input files with a single output file
-
- ON-LINE HELP: Enter: BigSort /?
-
- EXECUTION:
- BigSort InputFile OutputFile [Format [Key[+Key2[+Key3..]]]]
-
- Note that only the first two parameter fields are required.
-
-
- INPUT FILE: Should contain all the data you want to sort. If an input
- file is not in the current directory, explicitly give the path
- (e.g. C:\DATA\DATAFILE.TXT). Up to eight files can be combined as
- multiple input files, with the output going to a single output
- file. Wildcards (? and *) may be used, and multiple file
- specifications can be combined with a plus (+) immediately between
- each input file spec (e.g. C:\USER\YOU\*.DAT+C:\DATA\*.DAT). Note
- that if the format of the input file is dBASE, there can only be
- one input file.
-
- OUTPUT FILE: All data from the input file(s) will be written to this file.
-
- FORMAT: Describes the format of the input and output data files.
- CRLF: Standard variable length records. (Default)
- Custom Variable: Any variable length records. (See "FORMAT")
- FIXED: Standard fixed length records.
- Custom Fixed: Any fixed length records. (See "FORMAT")
- COMMA: Standard comma delimited records.
- Custom Delimited: Any delimited records. (See "FORMAT")
- dBASE: For dBASE III and IV records.
-
- KEY: The key defines what part of each record will be used to sort the
- records. The key is made up of segments of the form:
- RecordPosition[(FieldLength)][/A|/D][/S|/I][/L|/R]
- Each segment consists of the following elements:
- RecordPosition Starting position in the record.
- FieldLength Number of bytes for the key segment.
- Switches Sets case sensitivity and ordering.
-
- FORMAT: The Format tells BigSort(tm) what kind of input to expect. There are
- four basic flavors:
- Fixed: All records are exactly the same number of bytes. This is
- generally the most speed-efficient record format.
- Variable: Records may differ in length, but each ends with a
- characteristic "end-of-record" marker. This is the default.
- Delimited: Variable length records, where each part of the record
- is split into fields. The delimiter separates these fields.
- dBASE: Records compatible with dBASE III or dBASE IV.
-
- FIXED: Specifies that every record has the same number of bytes, and each
- record has an end-of-record marker of CR (carriage return) followed
- by LF (line feed). BigSort(tm) will automatically figure out the
- length of the Fixed records in the input file. To use, enter the
- format parameter as "FIXED". Since sorting fixed length records
- is faster than sorting variable length records, specify FIXED records
- instead of CR/LF whenever applicable.
-
- Custom Fixed: Specifies the number of bytes per record, which includes
- the end-of-record marker (e.g. if the record ends in CR/LF, be sure
- to include the two bytes in the record length). To use, set the
- format to be a decimal number, such as "200".
-
- CR/LF: Specifies the precise CR and LF byte(s) which mark the end of
- each record. To use, set the format parameter to be CR, LF, CRLF,
- LFCR, etc. The CR/LF end of record marker can be used for fixed
- length records and/or variable length records. Note that many word
- processors, editors, etc. use CRLF to denote the end of a line.
-
- Custom Variable: Specifies the precise byte(s) used to mark the end of
- each record, in HEX form. This format begins with a pound sign (#)
- and is followed by the hex representation of the bytes denoting the
- end of each record. This form can be used for the same purpose as
- CR/LF (where CRLF would be represented as #0D0A) as well as for
- more specialized markers (e.g. #00 for null-terminated records).
-
- COMMA: Specifies the records are comma delimited. This means each field
- of the record is separated by a comma, text fields are enclosed by
- double quotes, and each record is terminated by CRLF. To use,
- enter the format parameter as "COMMA". Note that this is the type
- of data file commonly created by programs written in BASIC.
-
- Custom Delimited: For delimited records, other than standard COMMA
- delimited. This format consists of a dollar sign ($), the hex
- representations of the delimiter and text marker, a colon (:), and
- an end-of-record marker. For example, the standard COMMA delimited
- format can be represented as "$2C22:0D0A", where the delimiter is a
- comma, the text marker is a double quote, and the end-of-record
- marker is CRLF. This format is seldom used.
-
- dBASE: For records compatible with the dBASE III or IV file formats.
- Note that only one dBASE file can be sorted at a time. To use,
- enter the format as "dBASE".
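-
- As a rough illustration of the Custom Variable (#) and Custom Delimited ($)
- notation described above, the Python sketch below decodes a format string
- into its bytes. It is not BigSort(tm) code: the function name
- parse_custom_format and the returned dictionary layout are invented for this
- example, and it assumes the delimiter and text marker are one byte each.
-

```python
def parse_custom_format(spec):
    """Decode a '#hex' or '$hex:hex' format string (illustrative only)."""
    if spec.startswith("#"):
        # Custom Variable: everything after '#' is the end-of-record marker.
        return {"kind": "variable", "eor": bytes.fromhex(spec[1:])}
    if spec.startswith("$"):
        # Custom Delimited: delimiter, text marker, ':', end-of-record marker.
        fields, _, eor = spec[1:].partition(":")
        raw = bytes.fromhex(fields)
        return {
            "kind": "delimited",
            "delimiter": raw[0:1],    # assumed to be a single byte
            "text_marker": raw[1:2],  # assumed to be a single byte
            "eor": bytes.fromhex(eor),
        }
    raise ValueError("not a custom format: " + spec)


print(parse_custom_format("#0D0A"))
print(parse_custom_format("$2C22:0D0A"))
```

- The second call yields a comma as the delimiter, a double quote as the text
- marker, and CRLF as the end-of-record marker, matching the COMMA format.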
-
- KEYS: The key specifies how the file should be sorted. The default key is
- "1(100)/L/A/S", i.e. the key starts in position 1, is 100 bytes long, is
- read from left to right, is in ascending order, and is case sensitive.
-
- If a key is specified, each segment of the key is specified by fields in
- the form: KeyPos[(KeyLen)][/L|/R][/A|/D][/S|/I]. Each segment can be up
- to 255 bytes, and the total length of all segments may not exceed 1024
- bytes. When a segment is specified, the first field (the KeyPos) is
- required, and the others may be omitted if the defaults are appropriate.
-
- KEYPOS: For Fixed and Variable records:
- The segment's starting position in the record. (i.e. if the
- KeyPos is 10, the segment starts at the 10th byte.)
- For Delimited and dBASE records:
- The field number, where the first field is number 1. (i.e. if
- the KeyPos is 10, the segment would be the 10th field.)
-
- KEYLEN: For Fixed and Variable records:
- Number of bytes for the key segment. If the records are of
- variable length and the key goes beyond the end of the record,
- #00 is automatically used to fill in the blank space.
- For Delimited records:
- In the case of numeric fields, always set the KeyLen to 0.
- For non-numeric fields, KeyLen refers to the number of bytes
- to use from the field, not including the text markers.
- For dBASE records:
- Do not specify a KeyLen. The length of the segment will be
- set automatically, as defined by the input dBASE file.
- NOTE: If a KeyPos is given without a KeyLen, the KeyLen defaults to 1.
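-
- The KeyPos and KeyLen rules for Fixed and Variable records can be pictured
- with a short Python sketch. The helper name extract_segment is hypothetical,
- and the sketch assumes the record is passed in without its end-of-record
- marker; how BigSort(tm) itself handles the marker is not stated here.
-

```python
def extract_segment(record, key_pos, key_len):
    """Return key_len bytes starting at 1-based key_pos, padded with #00."""
    start = key_pos - 1                      # KeyPos counts from 1
    segment = record[start:start + key_len]  # may come up short
    return segment.ljust(key_len, b"\x00")   # fill the gap with #00 bytes


short = b"Ohio"
print(extract_segment(short, 1, 6))   # b'Ohio\x00\x00'
print(extract_segment(short, 3, 2))   # b'io'
```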
-
- SEQUENCE: The /L and /R switches denote Left-to-Right and Right-to-Left
- ordering. The default is /L. Use /R when, for example, you are sorting
- a binary word, and you need to reverse the order of the hi and lo bytes.
- (e.g. 20(2)/R). Note that the KeyPos is still the left-most byte when
- using the /R switch. (e.g. the key will use byte 21 followed by 20).
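-
- To see why /R matters for binary words, consider the Python sketch below. It
- assumes 16-bit values stored low byte first (as on Intel-based PCs); the
- sample values are arbitrary. Sorting on the raw bytes left to right does not
- follow numeric order, while reading each word right to left does.
-

```python
import struct

# Three 16-bit values stored low byte first (little-endian, an assumption).
values = [258, 3, 512]
words = [struct.pack("<H", v) for v in values]

# Comparing the bytes left to right (/L) does not follow numeric order...
left_to_right = sorted(words)
# ...but comparing each word right to left (/R) does.
right_to_left = sorted(words, key=lambda w: w[::-1])

print([struct.unpack("<H", w)[0] for w in left_to_right])   # [512, 258, 3]
print([struct.unpack("<H", w)[0] for w in right_to_left])   # [3, 258, 512]
```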
-
- ORDER: The /A and /D switches denote ascending or descending order. The
- default is /A. Use /D when, for example, you want the largest number to
- come first in the list. (e.g. 10(3)/D). Note that in the example the
- first three digits starting in position 10 are used.
-
- CASE SENSITIVITY: The /S and /I switches denote Case Sensitive and Case
- Insensitive (or Ignore Case) ordering. The default is /S. Using a key
- of 1(15)/I will put Michigan before MISSISSIPPI. The key 1(15)/S will
- put MISSISSIPPI before Michigan. Be sure to always use the /S switch
- when sorting binary numbers. Also, using the /I switch may degrade the
- efficiency of BigSort(tm) by up to 25%. See "Suggestions and Technical
- Notes" below for more on how to improve efficiency.
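-
- The Michigan/MISSISSIPPI ordering can be reproduced with plain byte
- comparison. This Python sketch only mimics the effect of /S and /I; folding
- to upper case is an assumption about how a case-insensitive compare might
- be done, not a description of BigSort(tm) internals.
-

```python
names = [b"Michigan", b"MISSISSIPPI"]

# /S: compare raw bytes; 'I' (0x49) sorts before 'i' (0x69).
case_sensitive = sorted(names)
# /I: fold case before comparing; 'C' sorts before 'S'.
case_insensitive = sorted(names, key=bytes.upper)

print(case_sensitive)    # [b'MISSISSIPPI', b'Michigan']
print(case_insensitive)  # [b'Michigan', b'MISSISSIPPI']
```

- Case folding treats byte values in the letter range as text, which is why
- it would scramble the order of binary numbers.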
-
- SEGMENTS: The key may be broken into segments when the sorting criteria are
- spread throughout the record. To do this, put a plus (+) between each
- segment. Note that the second segment will only be used if the first
- segments match exactly. (e.g. A key such as 100(10)+200(5) will sort
- starting at position 100 for 10 bytes. If two records have the exact
- same 10 bytes, the 5 bytes at position 200 will serve as a tie breaker.)
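-
- The key syntax described in this section can be summarized with a small
- Python parser. This is a sketch for reading key strings, not the parser
- BigSort(tm) uses; the names SEGMENT and parse_key and the dictionary fields
- are invented here, and only upper-case switches are accepted because the
- text does not say whether lower case is allowed.
-

```python
import re

# KeyPos, optional (KeyLen), then any number of /A /D /S /I /L /R switches.
SEGMENT = re.compile(r"^(\d+)(?:\((\d+)\))?((?:/[ADSILR])*)$")


def parse_key(key):
    """Split a key such as '34(8)/D+1(14)/I' into a list of segment dicts."""
    segments = []
    for text in key.split("+"):
        match = SEGMENT.match(text)
        if match is None:
            raise ValueError("bad key segment: " + text)
        pos, length, switches = match.groups()
        flags = set(switches.replace("/", ""))
        segments.append({
            "pos": int(pos),
            # A missing KeyLen is recorded as None rather than guessed.
            "len": int(length) if length else None,
            "descending": "D" in flags,     # default is /A
            "ignore_case": "I" in flags,    # default is /S
            "right_to_left": "R" in flags,  # default is /L
        })
    return segments


for seg in parse_key("34(8)/D+1(14)/I"):
    print(seg)
```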
-
- EXAMPLES: THE BASICS:
-
- There are three sample files included. They are:
- Sample.TXT Standard ASCII version of the sample data
- Sample.DEL Comma delimited version of the sample data
- Sample.DBF dBASE version of the sample data.
-
- The sample data includes records of state names, capitals, their postal
- abbreviations, and state populations.
-
- To look at the input file, type:
- TYPE SAMPLE.TXT | MORE
- Note that the "|" character is not a colon; it is a vertical bar. If
- you are working on a portable computer keyboard, you may not have access
- to this character.
-
- After trying an example, look at the output by typing:
- TYPE OUT.DAT | MORE
- To look at the output of the dBASE samples, you must use a database
- package.
-
- 1) To sort by state name, use:
- BigSort SAMPLE.TXT OUT.DAT (Defaults)
- -or- BigSort SAMPLE.TXT OUT.DAT CRLF 1(14) Format = CRLF
- -or- BigSort SAMPLE.TXT OUT.DAT FIXED 1(14) Format = Fixed
- -or- BigSort SAMPLE.TXT OUT.DAT 43 1(14) Format = Fixed 43
- The output from any of these will be the same. Note that the fixed
- formats are much faster. Since the default is CRLF with a key of
- 1(100), it is often good to specify FIXED when you can.
-
- 2) To sort by population, use:
- BigSort SAMPLE.TXT OUT.DAT FIXED 34(8)
- Note that numbers can be sorted when they are right justified.
-
- 3) When the populations are the same, use state names to break the tie:
- BigSort SAMPLE.TXT OUT.DAT FIXED 34(8)+1(14)
- Note: when two keys match exactly for the whole key, the one which
- appeared first in the input file will appear first in the output
- file.
-
- 4) If you have two files with the same format, they can be sorted together.
- Since there is only one text example file included, we will use it twice
- for demonstration purposes:
- BigSort SAMPLE.TXT+SAMPLE.TXT OUT.DAT FIXED 1(14)
- The output will have two records for each state.
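-
- Two notes from the examples above, that right-justified numbers sort
- correctly and that records with equal keys keep their input order, can be
- checked with a Python sketch. The record layout (an 8-byte name followed by
- a 5-byte count) and the data are made up for illustration and do not match
- SAMPLE.TXT.
-

```python
def make_record(name, count):
    # Name left-justified in 8 bytes, count right-justified in 5 bytes.
    return "{:<8}{:>5}".format(name, count).encode("ascii")


records = [make_record(n, c) for n, c in
           [("Delta", 950), ("Alpha", 12000), ("Carol", 950), ("Bravo", 7000)]]

# Equivalent of a key of 9(5): compare bytes 9 through 13 only.
# Leading spaces (0x20) sort before digits, so shorter numbers come first.
by_count = sorted(records, key=lambda r: r[8:13])

for r in by_count:
    print(r.decode("ascii"))
```

- Delta and Carol share the count 950; Delta stays ahead of Carol because it
- came first in the input (Python's sort is stable, like the behavior
- described in example 3).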
-
- EXAMPLES: COMMA AND dBASE:
-
- Both dBASE and delimited forms are based on fields, as opposed to fixed
- positions.
-
- 1) To sort by state name, use:
- BigSort SAMPLE.DEL OUT.DAT COMMA 1(14) (Delimited Only)
- -or- BigSort SAMPLE.DBF OUT.DBF dBASE 1 (dBASE Only)
-
- 2) To sort by state population, use:
- BigSort SAMPLE.DEL OUT.DAT COMMA 4 (Delimited Only)
- -or- BigSort SAMPLE.DBF OUT.DBF dBASE 4 (dBASE Only)
- Note: By not specifying a length in the delimited field, BigSort(tm)
- knows to treat the field as a number.
-
- 3) If there were a second file (SAMPLE2.DEL), use:
- BigSort SAMPLE*.DEL OUT.DAT COMMA 1(14)
- -or- BigSort SAMPLE.DEL+SAMPLE2.DEL OUT.DAT COMMA 1(14)
- Note: dBASE files cannot be sorted together. You can, however, sort
- two dBASE files separately.
-
-
- EXAMPLES: SWITCHES:
-
- 1) If the states are sometimes all in capitals and sometimes starting with
- a single capital followed by lower case letters, use:
- BigSort SAMPLE.TXT OUT.DAT FIXED 1(14)/I
- Note: The /I switch turns off case sensitivity (case Insensitive) and
- allows upper and lower case words to be sorted together. Without
- the /I switch, they would not be sorted properly. Note, however,
- that the /I switch slows down processing by about 25%. Also, never
- use the /I switch with binary numbers.
-
- 2) To sort by state name from "Z" to "A", use:
- BigSort SAMPLE.TXT OUT.DAT FIXED 1(14)/D/I
- Note: The /D switch stands for Descending ordering.
-
- 3) To sort by state name within descending population (99999 -> 00000),
- use:
- BigSort SAMPLE.TXT OUT.DAT FIXED 34(8)/D+1(14)/I
-
- 4) There are some cases in which you might want to reverse the order of the
- characters. This is especially true when sorting binary numbers. As an
- example, you might want to sort the state postal code, so that the first
- letter counts as the second digit and vice-versa:
- BigSort SAMPLE.TXT OUT.DAT FIXED 17(1)+16(1)
- -or- BigSort SAMPLE.TXT OUT.DAT FIXED 16(2)/R
-
- SUGGESTIONS AND TECHNICAL NOTES:
-
- A) If the records are known to be of fixed length and that length is known,
- use the FIXED or Custom Fixed (record length) format. BigSort(tm) is fastest with
- fixed length records, performing about 25% better than the default CR/LF
- variable record length format.
-
- B) Avoid using the /I switch when it is not necessary. It can slow down
- processing by 25%.
-
- C) To maximize its efficiency, have as much RAM available as possible when
- running; the less swapped to disk, the better.
-
- D) To further improve efficiency, send the output to a different volume
- (another hard disk) if available. This prevents the drive from going
- back and forth from the input and output files, which can add up to lots
- of time. DO NOT use a floppy disk, as they are VERY slow.
-
- E) Make sure there are sufficient FILES available, as defined in your
- CONFIG.SYS. Generally only about 4 FILES are necessary. However, in
- the worst case BigSort(tm) requires many more FILES. Make sure your
- CONFIG.SYS sets FILES=15 or more.
-
- F) The maximum combined length of all the input files is 2,147,483,647 bytes (2GB).
- The maximum length of each record is 65535 bytes (64KB). There can be
- up to eight key segments, for a total maximum length of 1024 bytes.
- Each segment can be up to 255 bytes long.
-
- G) Don't use the name of the input file for the output file. If you have a
- typo or the power goes out, you could lose both the original and
- sorted data!
-
- H) Keep the key as short as possible. For instance, if one part of the key
- is a number which is unique to each record, don't include any additional
- key segments. Also avoid large sections of white space in the key.
-
- I) Here is a neat trick: If you have a long text file, and you want to sort
- groups of records, place an extra blank line (CR/LF) after each group.
- Then enter: BigSort <InputFile> <OutputFile> CRLFCRLF <SomeKey>. This
- tells BigSort(tm) that the group of records is a single variable-length
- record. This works provided no group of records exceeds 64k.
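-
- The effect of the CRLFCRLF trick in note I can be imitated in Python: split
- the text on a doubled CR/LF, sort the groups, and join them again. The
- function name sort_groups and the default key of the first 10 bytes are
- assumptions for this sketch; it does not model the 64k group limit.
-

```python
def sort_groups(data, key_pos=1, key_len=10):
    """Sort blank-line-separated groups by bytes taken from each group."""
    marker = b"\r\n\r\n"                      # CR/LF ending a line + blank line
    groups = [g for g in data.split(marker) if g]
    start = key_pos - 1                       # positions count from 1
    groups.sort(key=lambda g: g[start:start + key_len])
    return marker.join(groups) + marker


text = (b"Zebra\r\nstripes\r\n\r\n"
        b"Ant\r\nsix legs\r\n\r\n")
print(sort_groups(text).decode("ascii"))
```

- Each group moves as a unit, so the lines inside a group stay in their
- original order.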
-
- DEFINITION OF SHAREWARE:
-
- Shareware distribution gives users a chance to try software before
- buying it. If you try a Shareware program and continue using it, you are
- expected to register. Individual programs differ on details -- some
- request registration while others require it, some specify a maximum
- trial period. With registration, you get anything from the simple right
- to continue using the software to an updated program with printed
- manual.
-
- Copyright laws apply to both Shareware and commercial software, and the
- copyright holder retains all rights, with a few specific exceptions as
- stated below. Shareware authors are accomplished programmers, just like
- commercial authors, and the programs are of comparable quality. (In both
- cases, there are good programs and bad ones!) The main difference is in
- the method of distribution. The author specifically grants the right to
- copy and distribute the software, either to all and sundry or to a
- specific group. For example, some authors require written permission
- before a commercial disk vendor may copy their Shareware.
-
- Shareware is a distribution method, not a type of software. You should
- find software that suits your needs and pocketbook, whether it's
- commercial or Shareware. The Shareware system makes fitting your needs
- easier, because you can try before you buy. And because the overhead is
- low, prices are low also. Shareware has the ultimate money-back
- guarantee -- if you don't use the product, you don't pay for it.
-
- THE ASSOCIATION OF SHAREWARE PROFESSIONALS:
-
- BigSort(tm) is produced by a member of the Association of Shareware
- Professionals (ASP). ASP wants to make sure that the shareware principle
- works for you. If you are unable to resolve a shareware-related problem
- with an ASP member by contacting the member directly, ASP may be able to
- help. The ASP Ombudsman can help you resolve a dispute or problem with
- an ASP member, but does not provide technical support for members'
- products. Please write to the ASP ombudsman at 545 Grover Road,
- Muskegon, MI 49442-9427 USA, FAX 616-788-2765 or send a CompuServe
- message via CompuServe Mail to ASP Ombudsman 70007,3536.
-
- _______
- ____|__ | (R)
- --| | |-------------------
- | ____|__ | Association of
- | | |_| Shareware
- |__| o | Professionals
- -----| | |---------------------
- |___|___| MEMBER
-
-
- DISCLAIMER - AGREEMENT:
-
- Users of BigSort(tm) must accept this disclaimer of warranty:
- "BigSort(tm) is supplied as is. The author disclaims all warranties,
- expressed or implied, including, without limitation, the warranties of
- merchantability and of fitness for any purpose. The author assumes no
- liability for damages, direct or consequential, which may result from
- the use of BigSort(tm)."
-
- BigSort(tm) is a "shareware program" and is provided at no charge to the
- user for evaluation. Feel free to share it with your friends, but
- please do not give it away altered or as part of another system. The
- essence of "user-supported" software is to provide personal computer
- users with quality software without high prices, and yet to provide
- incentive for programmers to continue to develop new products. If you
- find this program useful and continue to use BigSort(tm) after a
- reasonable trial period, you must make a registration payment of $20
- to MeadowBrook Industries, Ltd. The
- $20 registration fee will license one copy for use on any one computer
- at any one time. You must treat this software just like a book. An
- example is that this software may be used by any number of people and
- may be freely moved from one computer location to another, so long as
- there is no possibility of it being used at one location while it's
- being used at another. Just as a book cannot be read by two different
- persons at the same time.
-
- Commercial users of BigSort(tm) must register and pay for their copies
- of BigSort(tm) within 30 days of first use or their license is
- withdrawn. Site-License arrangements may be made by contacting
- MeadowBrook Industries, Ltd.
-
- Anyone distributing BigSort(tm) for any kind of remuneration must first
- contact MeadowBrook Industries, Ltd at the address below for
- authorization. This authorization will be automatically granted to
- distributors recognized by the (ASP) as adhering to its guidelines for
- shareware distributors, and such distributors may begin offering
- BigSort(tm) immediately (However MeadowBrook Industries, Ltd must still
- be advised so that the distributor can be kept up-to-date with the
- latest version of BigSort(tm)).
-
- You are encouraged to pass a copy of BigSort(tm) along to your friends
- for evaluation. Please encourage them to register their copy if they
- find that they can use it. All registered users will receive a copy of
- the latest version of the BigSort(tm) system and printed documentation.
-
- SUPPORT:
-
- If you have any questions or problems concerning the use of BigSort(tm),
- please write to the address below. Alternatively, if you are a user of
- email, write to the email address listed below for a faster response.
- There are no duration restrictions placed on BigSort's support services.
- When requesting support, be sure to include the version number, as well
- as a complete description of the problem.
-
- REGISTRATION:
-
- Once you have determined BigSort(tm) is a tool you intend to use, please
- register your copy:
-
- SINGLE MACHINE LICENSE: $ 20.00
-
- MULTIPLE SINGLE USER MACHINE LICENSES: $ 15.00 per machine
- (2+ computers)
-
- NETWORKED MACHINE LICENSES / SINGLE SERVER: $ 10.00 per node
- (4+ computers)
-
- After receiving your registration fee, MeadowBrook will send your
- registered copies of the latest version of BigSort(tm) with printed
- documentation, and upgrade information as it becomes available. Be sure
- to include your address and to specify either 3 1/2 or 5 1/4 inch disks.
- If you prefer product information to arrive via email rather than
- conventional mail, please include your email address. Please see
- REGISTER.DOC for more information.
-
-
- CONTACTING MEADOWBROOK INDUSTRIES:
-
- If you have any questions, suggestions, or wish to register BigSort(tm),
- write to:
-
- BigSort(tm)
- MeadowBrook Industries, Ltd.
- 450 Veterans Drive
- Burlington, NJ 08016
-
- Or send email to: BigSort@Poor.Pgh.PA.US
-
-
- BIGSORT-PLUS:
-
- BigSortPlus(tm) is now available for $ 49.00 on a commercial basis. It
- contains all the functionality of BigSort(tm) in an interactive
- environment:
-
- * Interactive version, with pull-down menus
- * On-line, context-sensitive help
- * Comprehensive manual included
-
- If you would prefer to purchase BigSortPlus(tm) instead of registering
- BigSort(tm), either purchase from your local software store or from the
- address above. Please specify either 3 1/2 or 5 1/4 inch disks.
-
- SINGLE MACHINE COPY OF BigSortPlus(tm): $ 49.00
-
- Discounts are available for networked systems and multiple single user
- systems. Please contact MeadowBrook Industries for more information.
-
-